max rank | avg. rank | sentence |
---|---|---|
89 | 28.1429 | The municipality haes an aurie o convert. |
89 | 37.0000 | The region haes an aurie o aboot convert. |
93 | 29.4444 | It is locatit in the sooth o the province. |
93 | 30.5556 | It is locatit on the sooth o the province. |
108 | 31.4444 | It is locatit tae the wast o the province. |
141 | 36.9000 | It is the lairgest region in the kintra bi population. |
144 | 46.5714 | The municipality covers a aurie o convert. |
144 | 46.2857 | The municipality covers an aurie o convert. |
156 | 32.4444 | The ceety haes the same name o the municipality. |
166 | 53.5714 | It haes a total population o inhabitants. |
179 | 43.0833 | It is ane o the lairgest ceeties in the warld bi aurie. |
193 | 83.4286 | The population in 2002 wis ( 2002 Census ). |
202 | 76.7000 | The ceety can be dividit intae a nummer o auries. |
211 | 86.8571 | The province is dividit intae twa municipalities. |
213 | 43.9091 | It is the seat o the municipality o the same name. |
213 | 39.6923 | It is the seat o the municipality o the same name an aw. |
213 | 37.8750 | The ceety is the seat o the municipality. |
228 | 77.6667 | The municipality wis established in 2007. |
277 | 120.0000 | The province is dividit intae five municipalities. |
284 | 89.0000 | He established the toun center, which is still in place the day. |
284 | 56.6667 | It is locatit in the center o the kintra. |
284 | 61.1250 | It is the admeenistrative center o the destrict. |
298 | 123.5556 | There a nummer o touns locatit athin this aurie. |
298 | 98.6364 | There haes been nae census syne the end o the war. |
323 | 97.4000 | The current toun wis built wast frae the auncient toun. |
350 | 53.3000 | It is the caipital an main toun o the island. |
352 | 62.9000 | It is locatit in the central pairt o the kintra. |
352 | 61.1000 | It is locatit in the central pairt o the state. |
371 | 174.8125 | They hae wan the Soviet Cup 10 times an the Roushie Cup 3 times an aw. |
392 | 98.5000 | It is in the soothren central region o the kintra. |
The maximum word rank of a sentence is, by definition, the rank of its rarest word. If this value is low, every word in the sentence is a high-frequency word. The table therefore lists the sentences with the lowest maximum word rank, restricted to sentences longer than 40 characters.
The overall distribution of the maximum rank across all sentences of the corpus is shown in a diagram with a log-scaled x-axis.
The sentences in the table are of interest because they are usually easy to understand. The distribution may give insight into the corpus and may provide parameters for language comparison.
While the distribution can be estimated from a small corpus, sentences like those in the table are rare, so a larger corpus will yield more striking examples.
Table data:
select max(w_id)-100 as m, avg(w_id)-100 as a, s.sentence
from sentences s, inv_w i
where s.s_id=i.s_id and length(sentence)>40 and i.w_id>100
group by s.s_id
order by m
limit 30;
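The logic of the table query can be sketched outside the database as follows. This is a minimal illustration, not the original tooling: it assumes, as the query suggests, that a word's rank is its `w_id` minus 100 and that ids up to 100 are skipped. The names `rank_stats` and `easiest_sentences` and the input format (a sentence text paired with its list of word ids) are hypothetical.

```python
from statistics import mean

# Assumption read off the query: rank = w_id - 100, and ids <= 100 are ignored.
ID_OFFSET = 100


def rank_stats(word_ids):
    """Return (max rank, average rank) for one sentence, or None if no id qualifies."""
    ranks = [w - ID_OFFSET for w in word_ids if w > ID_OFFSET]
    if not ranks:
        return None
    return max(ranks), mean(ranks)


def easiest_sentences(sentences, min_length=40, limit=30):
    """Mirror the table query for an iterable of (text, word_ids) pairs.

    Keeps sentences longer than min_length characters and returns the
    `limit` rows with the lowest maximum rank as (max, avg, text) tuples.
    """
    rows = []
    for text, word_ids in sentences:
        if len(text) <= min_length:
            continue
        stats = rank_stats(word_ids)
        if stats is not None:
            rows.append((stats[0], stats[1], text))
    rows.sort(key=lambda row: row[0])
    return rows[:limit]
```

Whether the average is taken over word types or over word occurrences depends on how `inv_w` stores repeated words, which the query alone does not reveal; the sketch simply averages whatever ids it is given.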
Distribution data:
select m, count(*)
from (select 100* round((max(w_id)-100)/100) as m
      from sentences s, inv_w i
      where s.s_id=i.s_id and i.w_id>100
      group by s.s_id) aa
group by m;
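The binning in the distribution query rounds each sentence's maximum rank to the nearest multiple of 100 and counts the sentences per bin. A minimal sketch of that step, assuming the maximum ranks have already been computed as non-negative integers; the function name `bin_max_ranks` is hypothetical, and halves are rounded upwards here, which may differ from the database's rounding in edge cases.

```python
from collections import Counter


def bin_max_ranks(max_ranks, width=100):
    """Count sentences per bin of maximum rank.

    Each rank is rounded to the nearest multiple of `width`
    (exact halves round up), then occurrences are counted.
    """
    half = width // 2
    return Counter(((rank + half) // width) * width for rank in max_ranks)
```

Since the diagram uses a log-scaled x-axis, bin 0 (maximum ranks below 50) would have to be dropped or shifted before plotting.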
Explain the distribution, especially the increase at its right-hand end.
4.5.2.2 Average word rank in sentence
4.5.2.3 Sentences consisting of many low frequency words I
4.5.2.4 Sentences consisting of many low frequency words II
4.5.2.5 Sentences consisting of short words only I
4.5.2.6 Sentences consisting of short words only II
4.5.2.7 Sentences consisting of long words only I
4.5.2.8 Sentences consisting of long words only II